NeurIPS 2021
- Europe > Netherlands > South Holland > Leiden (0.05)
- North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- (19 more...)
Dynamic influence maximization
We initiate a systematic study of {\em dynamic influence maximization} (DIM). In the DIM problem, one maintains a seed set $S$ of at most $k$ nodes in a dynamically evolving social network, with the goal of maximizing the expected influence spread while minimizing the amortized update cost. We consider two evolution models. In the {\em incremental model}, the social network only grows over time: new users are introduced and new social links are established. For this model we design an algorithm that achieves a $(1-1/e-\epsilon)$-approximation to the optimal solution and has $k \cdot \mathsf{poly}(\log n, \epsilon^{-1})$ amortized running time, matching the state-of-the-art offline algorithm up to poly-logarithmic overhead. In the {\em fully dynamic model}, users may join and leave, and influence propagation can be strengthened or weakened in real time. Here we prove that, under the Strong Exponential Time Hypothesis (SETH), no algorithm can achieve a $2^{-(\log n)^{1-o(1)}}$-approximation unless its amortized running time is $n^{1-o(1)}$. On the technical side, we exploit novel adaptive sampling approaches that reduce DIM to the dynamic MAX-$k$ coverage problem, and design an efficient $(1-1/e-\epsilon)$-approximation algorithm for it. Our lower bound leverages the recently developed distributed PCP framework.
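The reduction described above bottoms out in MAX-k coverage over sampled reverse-reachable (RR) sets. As a point of reference for the $(1-1/e-\epsilon)$ guarantee, here is a minimal Python sketch of the static greedy coverage subroutine that the offline baseline relies on; it is not the paper's dynamic data structure, and the function name and input format are illustrative only.

```python
from collections import defaultdict

def greedy_max_k_coverage(rr_sets, k):
    """Greedy MAX-k coverage over sampled reverse-reachable (RR) sets.

    rr_sets: list of sets of node ids, each one a sampled RR set.
    Returns (seed set of <= k nodes, fraction of RR sets covered); the
    greedy rule gives the classical (1 - 1/e) coverage guarantee.
    """
    covers = defaultdict(set)   # node -> indices of RR sets containing it
    for i, rr in enumerate(rr_sets):
        for v in rr:
            covers[v].add(i)

    seeds, covered = set(), set()
    for _ in range(k):
        # pick the node that covers the most not-yet-covered RR sets
        best = max(covers, key=lambda v: len(covers[v] - covered), default=None)
        if best is None or not covers[best] - covered:
            break
        covered |= covers.pop(best)
        seeds.add(best)
    return seeds, len(covered) / max(len(rr_sets), 1)

# toy usage: three sampled RR sets over nodes {1, 2, 3, 4}
print(greedy_max_k_coverage([{1, 2}, {2, 3}, {4}], k=2))  # ({2, 4}, 1.0)
```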
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance, such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in results reported with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, and we present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks, including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied by an open-source library, rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.
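The aggregate statistics the abstract advocates are straightforward to reproduce outside the accompanying library; the sketch below shows the interquartile mean (IQM) and a stratified-bootstrap interval estimate using only NumPy/SciPy. The function names and the (runs x tasks) score layout are assumptions of this sketch, not rliable's API, which provides polished versions of the same ideas.

```python
import numpy as np
from scipy import stats

def iqm(scores):
    """Interquartile mean: mean of the middle 50% of the normalized scores."""
    return stats.trim_mean(np.asarray(scores).ravel(), proportiontocut=0.25)

def iqm_with_interval(score_matrix, reps=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for the IQM.

    score_matrix: array of shape (num_runs, num_tasks) of normalized scores.
    Runs are resampled with replacement independently per task (stratified).
    """
    rng = np.random.default_rng(seed)
    runs = np.asarray(score_matrix, dtype=float)
    n_runs, n_tasks = runs.shape
    boot = []
    for _ in range(reps):
        idx = rng.integers(0, n_runs, size=(n_runs, n_tasks))
        boot.append(iqm(np.take_along_axis(runs, idx, axis=0)))
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return iqm(runs), (lo, hi)

# toy usage: 5 runs x 3 tasks of human-normalized scores
scores = np.array([[0.1, 1.2, 0.8], [0.2, 0.9, 1.1],
                   [0.0, 1.5, 0.7], [0.3, 1.0, 0.9], [0.1, 1.1, 1.0]])
print(iqm_with_interval(scores))
```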
Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment
Beygelzimer, Alina, Dauphin, Yann N., Liang, Percy, Vaughan, Jennifer Wortman
We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process. We observe that the two committees disagree on their accept/reject recommendations for 23% of the papers and that, consistent with the results from 2014, approximately half of the list of accepted papers would change if the review process were randomly rerun. Our analysis suggests that making the conference more selective would increase the arbitrariness of the process. Taken together with previous research, our results highlight the inherent difficulty of objectively measuring the quality of research, and suggest that authors should not be excessively discouraged when their work is rejected.
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.88)
- Law (0.46)
- Health & Medicine (0.46)
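A rough back-of-the-envelope check of the "approximately half of the accepted papers would change" claim in the abstract above, assuming the two committees accept at roughly the conference's overall rate and that disagreements split evenly between them (the exact figures and methodology are in the paper; the numbers here are only illustrative):

```python
accept_rate = 0.26   # approximate NeurIPS 2021 acceptance rate (illustrative)
disagree = 0.23      # co-reviewed papers on which the committees' decisions conflicted

# If disagreements split evenly, each committee accepts roughly `accept_rate`
# of the papers, and both committees accept accept_rate - disagree / 2 of them.
both_accept = accept_rate - disagree / 2
kept = both_accept / accept_rate    # share of one committee's accepts the other keeps
print(f"~{1 - kept:.0%} of the accepted list would change")  # ~44%, i.e. roughly half
```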
How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?
Rastogi, Charvi, Stelmakh, Ivan, Beygelzimer, Alina, Dauphin, Yann N., Liang, Percy, Vaughan, Jennifer Wortman, Xue, Zhenyu, Daumé, Hal III, Pierson, Emma, Shah, Nihar B.
How do author perceptions match up to the outcomes of the peer-review process and the perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception of their own papers after seeing the reviews. The salient results are: (1) Authors overestimate the acceptance probability of their papers roughly three-fold: the median prediction is 70% for an approximately 25% acceptance rate. (2) Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers are similarly calibrated to one another, but better calibrated than those of authors who were not invited to review. (3) Authors' relative rankings of the scientific contribution of two of their own submissions generally agree (93%) with their predicted acceptance probabilities, but in a notable 7% of responses authors expect their better paper to face the worse outcome. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, they disagreed at a similar rate -- about a third of the time. (5) At least 30% of respondents, for both accepted and rejected papers, said that their perception of their own paper improved after the review process. Stakeholders in peer review should take these findings into account when setting their expectations of peer review.
- South America (0.04)
- Oceania (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (7 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Government (0.46)
- Health & Medicine (0.46)
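A tiny illustration of the calibration comparison behind finding (1), using made-up survey rows (the variable names and data are hypothetical; the actual analysis and numbers are in the paper):

```python
import numpy as np

# Hypothetical survey data: each entry pairs an author's predicted acceptance
# probability with whether that paper was actually accepted.
predicted = np.array([0.8, 0.7, 0.9, 0.5, 0.6, 0.75, 0.7, 0.65])
accepted  = np.array([1,   0,   1,   0,   0,   0,    0,   0])

median_prediction = np.median(predicted)   # what authors expect
empirical_rate = accepted.mean()           # what actually happens
print(f"median prediction {median_prediction:.0%} vs. acceptance rate "
      f"{empirical_rate:.0%} ({median_prediction / empirical_rate:.1f}x overestimate)")
```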
Lessons learned from the NeurIPS 2021 MetaDL challenge: Backbone fine-tuning without episodic meta-learning dominates for few-shot learning image classification
Baz, Adrian El, Ullah, Ihsan, Alcobaça, Edesio, Carvalho, André C. P. L. F., Chen, Hong, Ferreira, Fabio, Gouk, Henry, Guan, Chaoyu, Guyon, Isabelle, Hospedales, Timothy, Hu, Shell, Huisman, Mike, Hutter, Frank, Liu, Zhengying, Mohr, Felix, Öztürk, Ekrem, van Rijn, Jan N., Sun, Haozhe, Wang, Xin, Zhu, Wenwu
Although deep neural networks are capable of achieving performance superior to humans on various tasks, they are notorious for requiring large amounts of data and computing resources, restricting their success to domains where such resources are available. Meta-learning methods can address this problem by transferring knowledge from related tasks, thus reducing the amount of data and computing resources needed to learn new tasks. We organize the MetaDL competition series, which provides opportunities for research groups all over the world to create and experimentally assess new meta-(deep)learning solutions for real problems. In this paper, authored collaboratively by the competition organizers and the top-ranked participants, we describe the design of the competition, the datasets, the best experimental results, and the top-ranked methods in the NeurIPS 2021 challenge, which attracted 15 active teams who made it to the final phase (by outperforming the baseline) and made over 100 code submissions during the feedback phase. The solutions of the top participants have been open-sourced. The lessons learned include that learning good representations is essential for effective transfer learning.
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- North America > United States > Virginia (0.04)
- (5 more...)
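The headline lesson (plain backbone fine-tuning beating episodic meta-learning) corresponds to a very simple recipe. Below is a minimal PyTorch sketch of that baseline in its simplest form: take a pre-trained backbone, freeze it, and fit a linear head on each few-shot task's support set. It is not any particular winning entry's code; the backbone choice, image sizes, and function names are assumptions of the sketch.

```python
import torch
import torch.nn as nn
from torchvision import models

def few_shot_predict(backbone, support_x, support_y, query_x, num_classes,
                     steps=100, lr=1e-2):
    """Frozen-backbone few-shot baseline for a single episode.

    support_x: (N_support, 3, H, W) images, support_y: (N_support,) labels,
    query_x: (N_query, 3, H, W). Only a linear head is fit on the support set.
    """
    backbone.eval()
    with torch.no_grad():
        sup_feat = backbone(support_x)      # (N_support, feat_dim)
        qry_feat = backbone(query_x)

    head = nn.Linear(sup_feat.shape[1], num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(head(sup_feat), support_y).backward()
        opt.step()
    return head(qry_feat).argmax(dim=1)

# Toy 5-way 5-shot episode with an ImageNet-pretrained ResNet-18 backbone
# (random tensors stand in for real task images).
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone = nn.Sequential(*list(resnet.children())[:-1], nn.Flatten())
support_x = torch.randn(25, 3, 224, 224)
support_y = torch.arange(5).repeat_interleave(5)
query_x = torch.randn(10, 3, 224, 224)
print(few_shot_predict(backbone, support_x, support_y, query_x, num_classes=5))
```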
Insights From the NeurIPS 2021 NetHack Challenge
Hambro, Eric, Mohanty, Sharada, Babaev, Dmitrii, Byeon, Minwoo, Chakraborty, Dipam, Grefenstette, Edward, Jiang, Minqi, Jo, Daejin, Kanervisto, Anssi, Kim, Jongmin, Kim, Sungwoong, Kirk, Robert, Kurin, Vitaly, Küttler, Heinrich, Kwon, Taehwon, Lee, Donghoon, Mella, Vegard, Nardelli, Nantas, Nazarov, Ivan, Ovsov, Nikita, Parker-Holder, Jack, Raileanu, Roberta, Ramanauskas, Karolis, Rocktäschel, Tim, Rothermel, Danielle, Samvelyan, Mikayel, Sorokin, Dmitry, Sypetkowski, Maciej, Sypetkowski, Michał
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI with many diverse approaches significantly beating the previously best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems, demonstrating that on NetHack symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack's suitability as a long-term benchmark for AI research.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Finland (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
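For context on the setup participants were handed, the sketch below shows the most basic interaction loop with the NetHack Learning Environment on the challenge task: a random-policy agent, where real entries substituted a learned policy or a symbolic bot. It assumes the nle package is installed and registers the NetHackChallenge-v0 Gym task as it did for the competition, and it uses the challenge-era Gym step API.

```python
import gym
import nle  # noqa: F401 -- importing nle registers the NetHack Gym environments

env = gym.make("NetHackChallenge-v0")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()          # stand-in for a real policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
env.close()
```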